AGTCCCAt ;KAC(i(Kj(*l'IM"lV(X\\GA(iA(iC''l\\AAAGA(iAACJGCKX^\(:iAGAAT(;TC GTCXCAG 
5 CCV\C3C\\GC.GAACX\\GACCTCTXXX < i ( i f j G C C A C A GAG G A C T. \ C TT ' C " \ A I G G C A G ( TG G T A C 1 
ATCGA I GAGCT CCAGGGGGGO iA( iGAGC TCCAGCCAGAGGGGGAAG TGCX XTX CTGCCAC 
AC CAGCA IACC A(CCX3<XXTXnAC< V\(/G(:(T(iCX'MXiGCCTCGCT(';TCAA'l'iX''rr(.iTC3C'IG 
CTGCTXXTXiGCOYIXKTGGTGAGGC ( iCXT X X \\( i( ' TCTGGCC IX 3A( TG TCi'J GCGTGGCAGG 
CCCGGcXT ( X X CAGCCCT G IGC .A IT I X 'I 'J G( 3C TGGCX 3ACAGGCCCXGGGCAGTGCTTGCT 
10 GCTG'ITITCA TGG IXXTC XTGA< !C"I < XXTTHXiTTrCXTGCTCCCCGAOGAGGACGCATITi 

C< CTICCTGACICTX GCX TCAGt \\CCCAGCCAAGATt iGGAAAACTGAGt 3CTCCAAGAGGG 
GC CTGGAAGA'J AC TGGGACTGITC I ATTATGC IGCC< TCTACTACCTT ClGGCnX CTG T 

< .( CACGGCTGGCC ACACAGCTt X "At ACCT GCTX TiGCAGCACGCTX i 1 C CTGC KXX X'ACC TT 

* i * iGGTO \-\GdTCTt 3< iCAGAGGGCAGAGTGTCX X (\\GGTGCCC\AAGA TCT ACAAG FACTAC 
15 IX C V TGCFt XX CTCX ( T Gt C IC 1 C ( TGG It 3GGCCTXX iGATTCCT GAGC CTTTGG FACCCT 

i .TGCAGC'IGG IGAGAACX TTC AGO X ; T A G G \ ( ' A ( ; ( i A G C A G i 3 CTG C\ V A G G G ( j CT G C A G AGC ' 
AGC FACT CTGAGG AA FA I ( IX 3AC iCi AA( X T < < T ' I T G( * A G G A A ( i A A G CTG G G A A G C AGCTAC 
( ACACC I CCA AGC \ K 3G< TT C < TG ICC T ( X .GCO < 3( G I XTX J< ^ ITGAGACAC T< i( A TC TAG 
AC TCCACAGC ( AGGA FT< (Ah 1V( X GC "I < .AAGGTGGTGC ITTCAGCT At AC I GA< WGGG 
20 ACGG(X7ATTTACCAGGT< .G( X X IGC TGC TGCTGG 'IX i< K3CG FGCH ACCOAC FATCCAGAAC i 
« TGAGGGCAt Kit 3G I CAC < 'AC GGA 1 G I C I CXTACX 1 X XTGGC CGGcTTTGt 3AA IO .TGCFC 
r r(X:(jAGt3ArAA(iC\AGGACjC;i(:.G'IXiC3A(;C''I GGIX3AA<:}C\\CC , ATC''r(3 rGCX iCT C FGGAAGTG 
TGCTACA TCTCXGCXTTGG IX 'IT X i M X / I GC I TACT CACC ITtXT GGTCX TC iATGCGCTCA 
t IX 3GTGACACACAC !GA( XWAO TT < < 3 AGC 1 CT i iCAt XGAGGAGCT GC CC IX iGAC 11 GAG T 
25 CCCTTGCAlXXiGAGTCO CATX < ( T i ( C XX CAACi( .X'ATA TK TG IT GG A I GAGCT I CACiT 
( .i. CTAC* A(3A( AGC ( TI I A K 1 G< ( ITX-G( >( TX C'TCiC iTC3CA(3CAGA TCA'I C IT X IT ( C I G 

< ;< i \ A(TACG(.;cCCTCi(K CTTX X IX3GTX XT C \TGCXTX i IX iC'H X\ATG<3< "AGGAACC RXT (i 
( KTTXX < i H( X CI(i(3AG IXX T< ( > I G( iC CXTTC I CK3C rC3A(TTTG( XXX TGGCTG 1 > iA I t 

( 'IX jt.'AGAACATGGCAGCX X'A ITGCiG IXTK X T (3CiA(3AC FCA I GA I XjGA( A CCC ACAGCTG 

30 A< x;AACt X .GCGAGIXiCK T A IX.CAt i(X'A(X II 1 1 TTXTCTTCXX (X T( AA KTTVit TG(3 TG 
(,(,!( ,C< \I< it,|( iG( X\\< CTXit jt GAti IX j( I ( < T< T< T< XX C TCTACAACX tC t/A IX CACXTT 
( .( rCX AGAH j( iAt X TCAG(XT< .» IX , ( X'ACC* .At iAGCCGC C ACTCK GA< X C( GGC 1AC I AC 
At ( 3 1 AO (,AAA( T I C ITGAAt ,A TTGAA< iH At XX AG I (X "JCA I CX'At X CA IXiACAt XX TT ( ' 
K jt; i ( X X TGC IX C I GCAACK X3C ACiAGt X T ( X TA( TX'AGGAC.X A IX iGCAG* CC CC< V\(i(.JAC 

35 AGCX rCAGA( CAGGGGAGGAAGAC (iAACJc.C3A r<3CAGCTGCTV\CA< TAG ,\AAG(iA< TX C A 1X3 
GCX AAt it iGAGCT AGGCCCt X,G< XX "AGC X X it; (X iCACiGGCTCX i( T X iGC3(3 1 CTX ;< "3C< I AC ACXi 
t K ;t. IX J( ACAAt XX AAt X C I ( X AC it j I ( T I ( X CiCAACi AC ( i(3( X X T X i IT ( .( iGTG< X AA IXiCi I 
( it i V \C,< ( < T(; At XiCiCAt i* iGAACit i IX AACCCACCTGC CCATC I Ci IX it T C ,A(i( iC A IX. IT CC 
Kit X i At TA lX CH X KXX. IXX ( ( X .Gc H h C H XX'AGC'ATC'ACACX'ACiCCATGCAGCX'A 

40 (,C \< XiK X IX XX ,G \ T< ACTXi lXiti riXiGC i IX it i At it i TCT( i I CTX it A( TX3GGAGC X TX 'A(3(iACi 
(,(,». TCTt ,(TX X'AC< 'CAtT TGt.CTA IGtXiAG \( it. X AC it A( iG( X i IT ( TX ,( i AGA A A A AA ACTX i 
( . I < ii iG r I At it .( it X l i t iGT ( ( 'At it iA' .t ( At i 1 K iA< X X At i(itiCACi( C At A H C'AGC X G IX T C 
{ (T At XX It it it I ( I ( it X \ It AGCX I It , \ \G< .GC X T ( X i \ IX ,AAGC X T H ' It IX it i AAC X 'AC I 
( CA» .( C ( At K ICC AO It At .< XTTX.t iC X IT ' 'At X ,i T GT GGAAGCAGCCAAGGCACTT (TT 

45 ( At C C XX ' I CA( X: X i( X AC X ,( \M X T t I C K it it it iAC T ( iC iC X X id A AAC XIX X X ( ,< , IX X IX IX X X ' 
C I GC A* 3C it iCAt iCX C A \t ,1 C A I ( i AC K At i AC X At it > IX X X 'AC At IX iAt it T f iC X 'CAC AC IX < i A 
( rAGCCAGATA ITTTTG I ACilTTTT A Hit XT 1 ICiCit I A I IA I GA A AC iAC X i IT AG I ( i It i IT t. ' 
t X TC X WATAAACTT GTTX X TC iAt iAA A A A A \ AA A A AAA A AAAA A A AAA A A A AA AAA AAA A A 
A A A A A A A A A A A AAA A A A A A A A A A A A A A A . \ A A A 

50 



FIGURE 2 

NlSSgPAliNQTSIKlATEDYSYCiSWYIDF.PQCiGEELgPnCJIiX'PSCHTSIPPGLYIIACI-ASLS 
IIA ; I.LIJ-ANHA 7 RRRQL\YPIX/VR( iRPGLPSPVDFLACU)RPRAVPAA\'FMVLI,SSLCLLLPD 
FDALPFL TLASAPSQDGK TEAPRGAWKILGLFYYAALYYPLAACATAGHTAAHLLCjS TLS 
5 WAIILGVQVWQRAECPQVPKIYKYYSLLASLPLLLGLGFLSLWYPVQLVRSFSRRTGAGSK 
GLQSSYSEHYLRNLLCRKKLGSSYHTSKHGFLSWARVCLRHCIYTPQPGFHI.PLKLVLSA 
'rLTGTAIYQVALLLLVGVVPTIQKVRAGVTrDVSYLI.AGFGIVLSEDKQEVVELVKHHLW 
ALEVCYISALVLSCLLTFLVLMRSLVTHRTNLRALHRGAALDLSPLHRSPHPSRQAIFCW 
MSFSAYQTAFIClXiLLVQQIIFFLGTTALAFLVLMPVLHGRNLLLFRSLESSWPFWLTLA 
10 LAVILQNMAAHWVFLF/niDGHPQLTORRVLYAA'rFLLFPLNVLVGAMVA'I'WRVLLSALY'N 
AIULGQMDLSLLPPRAATLDPGYYTYRNFLKIEVSQSHPAMTAFCSEELQAQSLLPRTMA 
APQDSLRPGEEDECiMQLLQTKDSMAKGARPGASRGRARWGLAY'I'I.I.HNPTLQVFRKTALL 
GANGAQP 

Important features of the protein:

Signal peptide:

None

Transmembrane domain:

54-69
102-11^
148-160
207-222
301-320
364-380
431-451
474-489
560-535

Motif file:

Motif name: N-glycosylation site.

8-12

Motif name: N-myristoylation site.

50-56
176-182
241-247
317-323
341-347
525-531
627-633
631-637
640-646
661-667

Motif name: Prokaryotic membrane lipoprotein lipid attachment site.

364-375

Motif name: ATP/GTP-binding site motif A (P-loop).

5 5 1 ^ Mm





FIGURE 3A

PRO                   XXXXXXXXXXXXXXX    (Length = 15 amino acids)
Comparison Protein    XXXXXYYYYYYY       (Length = 12 amino acids)

% amino acid sequence identity =

(the number of identically matching amino acid residues between the two polypeptide
sequences as determined by ALIGN-2) divided by (the total number of amino acid
residues of the PRO polypeptide) =

5 divided by 15 = 33.3%





FIGURE 3B

PRO                   XXXXXXXXXX         (Length = 10 amino acids)
Comparison Protein    XXXXXYYYYYYZZYZ    (Length = 15 amino acids)

% amino acid sequence identity =

(the number of identically matching amino acid residues between the two polypeptide
sequences as determined by ALIGN-2) divided by (the total number of amino acid
residues of the PRO polypeptide) =

5 divided by 10 = 50%




FIGURE 3C

PRO-DNA           NNNNNNNNNNNNNN      (Length = 14 nucleotides)
Comparison DNA    NNNNNNLLLLLLLLLL    (Length = 16 nucleotides)

% nucleic acid sequence identity =

(the number of identically matching nucleotides between the two nucleic acid sequences
as determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA
nucleic acid sequence) =

6 divided by 14 = 42.9%

PRO-DNA           NNNNNNNNNNNN    (Length = 12 nucleotides)
Comparison DNA    NNNNLLLVV       (Length = 9 nucleotides)

% nucleic acid sequence identity =

(the number of identically matching nucleotides between the two nucleic acid sequences
as determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA
nucleic acid sequence) =

4 divided by 12 = 33.3%
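
The worked examples above all apply the same ratio: identical matches divided by the length of the PRO (or PRO-DNA) sequence. A minimal sketch of that arithmetic in C follows. It assumes the match count has already been obtained from the alignment program; the name percent_identity is hypothetical and does not appear in the ALIGN-2 listing.

```c
#include <assert.h>

/* Percent identity = 100 * (identical matches) / (length of the PRO sequence).
 * Hypothetical helper for illustration only; not part of the ALIGN-2 source. */
double percent_identity(int matches, int pro_length)
{
    assert(pro_length > 0);
    assert(matches >= 0 && matches <= pro_length);
    return 100.0 * (double)matches / (double)pro_length;
}
```

With 5 matches against a 10-residue PRO sequence the function returns 50.0; note that the denominator is always the PRO length, never the length of the comparison sequence.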
/*
 * C-C increased from 12 to 15
 * Z is average of EQ
 * B is average of ND
 * match with stop is _M; stop-stop = 0; J (joker) match = 0
 */

#define _M -8 /* value of a match with a stop */



int _day[26][26] = {
/*        A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z */
/* A */ { 2, 0,-2, 0, 0,-4, 1,-1,-1, 0,-1,-2,-1, 0,_M, 1, 0,-2, 1, 1, 0, 0,-6, 0,-3, 0},
/* B */ { 0, 3,-4, 3, 2,-5, 0, 1,-2, 0, 0,-3,-2, 2,_M,-1, 1, 0, 0, 0, 0,-2,-5, 0,-3, 1},
/* C */ {-2,-4,15,-5,-5,-4,-3,-3,-2, 0,-5,-6,-5,-4,_M,-3,-5,-4, 0,-2, 0,-2,-8, 0, 0,-5},
/* D */ { 0, 3,-5, 4, 3,-6, 1, 1,-2, 0, 0,-4,-3, 2,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 2},
/* E */ { 0, 2,-5, 3, 4,-5, 0, 1,-2, 0, 0,-3,-2, 1,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 3},
/* F */ {-4,-5,-4,-6,-5, 9,-5,-2, 1, 0,-5, 2, 0,-4,_M,-5,-5,-4,-3,-3, 0,-1, 0, 0, 7,-5},
/* G */ { 1, 0,-3, 1, 0,-5, 5,-2,-3, 0,-2,-4,-3, 0,_M,-1,-1,-3, 1, 0, 0,-1,-7, 0,-5, 0},
/* H */ {-1, 1,-3, 1, 1,-2,-2, 6,-2, 0, 0,-2,-2, 2,_M, 0, 3, 2,-1,-1, 0,-2,-3, 0, 0, 2},
/* I */ {-1,-2,-2,-2,-2, 1,-3,-2, 5, 0,-2, 2, 2,-2,_M,-2,-2,-2,-1, 0, 0, 4,-5, 0,-1,-2},
/* J */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* K */ {-1, 0,-5, 0, 0,-5,-2, 0,-2, 0, 5,-3, 0, 1,_M,-1, 1, 3, 0, 0, 0,-2,-3, 0,-4, 0},
/* L */ {-2,-3,-6,-4,-3, 2,-4,-2, 2, 0,-3, 6, 4,-3,_M,-3,-2,-3,-3,-1, 0, 2,-2, 0,-1,-2},
/* M */ {-1,-2,-5,-3,-2, 0,-3,-2, 2, 0, 0, 4, 6,-2,_M,-2,-1, 0,-2,-1, 0, 2,-4, 0,-2,-1},
/* N */ { 0, 2,-4, 2, 1,-4, 0, 2,-2, 0, 1,-3,-2, 2,_M,-1, 1, 0, 1, 0, 0,-2,-4, 0,-2, 1},
/* O */ {_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M, 0,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M},
/* P */ { 1,-1,-3,-1,-1,-5,-1, 0,-2, 0,-1,-3,-2,-1,_M, 6, 0, 0, 1, 0, 0,-1,-6, 0,-5, 0},
/* Q */ { 0, 1,-5, 2, 2,-5,-1, 3,-2, 0, 1,-2,-1, 1,_M, 0, 4, 1,-1,-1, 0,-2,-5, 0,-4, 3},
/* R */ {-2, 0,-4,-1,-1,-4,-3, 2,-2, 0, 3,-3, 0, 0,_M, 0, 1, 6, 0,-1, 0,-2, 2, 0,-4, 0},
/* S */ { 1, 0, 0, 0, 0,-3, 1,-1,-1, 0, 0,-3,-2, 1,_M, 1,-1, 0, 2, 1, 0,-1,-2, 0,-3, 0},
/* T */ { 1, 0,-2, 0, 0,-3, 0,-1, 0, 0, 0,-1,-1, 0,_M, 0,-1,-1, 1, 3, 0, 0,-5, 0,-3, 0},
/* U */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* V */ { 0,-2,-2,-2,-2,-1,-1,-2, 4, 0,-2, 2, 2,-2,_M,-1,-2,-2,-1, 0, 0, 4,-6, 0,-2,-2},
/* W */ {-6,-5,-8,-7,-7, 0,-7,-3,-5, 0,-3,-2,-4,-4,_M,-6,-5, 2,-2,-5, 0,-6,17, 0, 0,-6},
/* X */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* Y */ {-3,-3, 0,-4,-4, 7,-5, 0,-1, 0,-4,-1,-2,-2,_M,-5,-4,-4,-3,-3, 0,-2, 0, 0,10,-4},
/* Z */ { 0, 1,-5, 2, 3,-5, 0, 2,-2, 0, 0,-2,-1, 1,_M, 0, 3, 0, 0, 0, 0,-2,-6, 0,-4, 4}
};
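
The scoring table above is addressed by letter offset, row and column each being a residue letter minus 'A'. A minimal sketch of that lookup follows. It is only an illustration: the table demo_day is a hypothetical, mostly empty stand-in that carries a handful of values copied from the listing (A/A = 2, A/C = -2, C/C = 15, W/W = 17), and residue_score is an assumed helper name.

```c
#include <assert.h>
#include <ctype.h>

/* Partial 26 x 26 table for illustration: only a few entries taken from
 * the listing are filled in; every other cell defaults to 0 here. */
static const int demo_day[26][26] = {
    ['A' - 'A'] = { ['A' - 'A'] = 2,  ['C' - 'A'] = -2 },
    ['C' - 'A'] = { ['A' - 'A'] = -2, ['C' - 'A'] = 15 },
    ['W' - 'A'] = { ['W' - 'A'] = 17 },
};

/* Score two residues given as letters (either case).
 * Hypothetical helper; the listing indexes the table directly. */
int residue_score(char a, char b)
{
    assert(isalpha((unsigned char)a) && isalpha((unsigned char)b));
    int i = toupper((unsigned char)a) - 'A';
    int j = toupper((unsigned char)b) - 'A';
    return demo_day[i][j];
}
```

Because the table is symmetric, residue_score('A', 'C') and residue_score('C', 'A') return the same value.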






FIGURE 4B 





#include <stdio.h>
#include <ctype.h>

#define MAXJMP  16      /* max jumps in a diag */
#define MAXGAP  24      /* don't continue to penalize gaps larger than this */
#define JMPS    1024    /* max jmps in a path */
#define MX      4       /* save if there's at least MX-1 bases since last jmp */

#define DMAT    3       /* value of matching bases */
#define DMIS    0       /* penalty for mismatched bases */
#define DINS0   8       /* penalty for a gap */
#define DINS1   1       /* penalty per base */
#define PINS0   8       /* penalty for a gap */
#define PINS1   4       /* penalty per residue */

struct jmp {
    short           n[MAXJMP];  /* size of jmp (neg for dely) */
    unsigned short  x[MAXJMP];  /* base no. of jmp in seq x */
};                              /* limits seq to 2^16 - 1 */

struct diag {
    int         score;      /* score at last jmp */
    long        offset;     /* offset of prev block */
    short       ijmp;       /* current jmp index */
    struct jmp  jp;         /* list of jmps */
};

struct path {
    int     spc;        /* number of leading spaces */
    short   n[JMPS];    /* size of jmp (gap) */
    int     x[JMPS];    /* loc of jmp (last elem before gap) */
};

char        *ofile;         /* output file name */
char        *namex[2];      /* seq names: getseqs() */
char        *prog;          /* prog name for err msgs */
char        *seqx[2];       /* seqs: getseqs() */
int         dmax;           /* best diag: nw() */
int         dmax0;          /* final diag */
int         dna;            /* set if dna: main() */
int         endgaps;        /* set if penalizing end gaps */
int         gapx, gapy;     /* total gaps in seqs */
int         len0, len1;     /* seq lens */
int         ngapx, ngapy;   /* total size of gaps */
int         smax;           /* max score: nw() */
int         *xbm;           /* bitmap for matching */
long        offset;         /* current offset in jmp file */
struct diag *dx;            /* holds diagonals */
struct path pp[2];          /* holds path for seqs */

char        *calloc(), *malloc(), *index(), *strcpy();
char        *getseq(), *g_calloc();
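
The penalty constants in the header above imply an affine gap cost: a fixed charge for opening a gap plus a per-element charge, with no further charge once a gap exceeds MAXGAP. The following is a minimal sketch of that cost under this reading. The function gap_cost is hypothetical, and the MAXGAP cap is assumed to apply only when end gaps are not being penalized.

```c
#include <assert.h>

#define MAXGAP 24   /* gaps longer than this are not penalized further */
#define DINS0  8    /* DNA: penalty for a gap */
#define DINS1  1    /* DNA: penalty per base */
#define PINS0  8    /* protein: penalty for a gap */
#define PINS1  4    /* protein: penalty per residue */

/* Cost of a single gap of `len` elements (hypothetical helper).
 * cost = ins0 + ins1 * min(len, MAXGAP); a zero-length gap costs nothing. */
int gap_cost(int is_dna, int len)
{
    assert(len >= 0);
    if (len == 0)
        return 0;
    int ins0 = is_dna ? DINS0 : PINS0;
    int ins1 = is_dna ? DINS1 : PINS1;
    int charged = (len < MAXGAP) ? len : MAXGAP;
    return ins0 + ins1 * charged;
}
```

Under these constants a three-residue protein gap costs 8 + 3 * 4 = 20, while a DNA gap of 30 bases is charged as if it were 24 bases long.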
/* Needleman-Wunsch alignment program
 *
 * usage: progs file1 file2
 *  where file1 and file2 are two dna or two protein sequences.
 *  The sequences can be in upper- or lower-case and may contain ambiguity
 *  Any lines beginning with ';', '>' or '<' are ignored
 *  Max file length is 65535 (limited by unsigned short x in the jmp struct)
 *  A sequence with 1/3 or more of its elements ACGTU is assumed to be DNA
 *  Output is in the file "align.out"
 *
 * The program may create a tmp file in /tmp to hold info about traceback.
 * Original version developed under BSD 4.3 on a vax 8650
 */

#include "nw.h"
#include "day.h"

static _dbval[26] = {
    1,14,2,13,0,0,4,11,0,0,12,0,3,15,0,0,0,5,6,8,8,7,9,0,10,0
};

static _pbval[26] = {
    1, 2|(1<<('D'-'A'))|(1<<('N'-'A')), 4, 8, 16, 32, 64,
    128, 256, 0xFFFFFFF, 1<<10, 1<<11, 1<<12, 1<<13, 1<<14,
    1<<15, 1<<16, 1<<17, 1<<18, 1<<19, 1<<20, 1<<21, 1<<22,
    1<<23, 1<<24, 1<<25|(1<<('E'-'A'))|(1<<('Q'-'A'))
};

main(ac, av)
    int     ac;
    char    *av[];
{
    prog = av[0];
    if (ac != 3) {
        fprintf(stderr, "usage: %s file1 file2\n", prog);
        fprintf(stderr, "where file1 and file2 are two dna or two protein sequences.\n");
        fprintf(stderr, "The sequences can be in upper- or lower-case\n");
        fprintf(stderr, "Any lines beginning with ';' or '<' are ignored\n");
        fprintf(stderr, "Output is in the file \"align.out\"\n");
        exit(1);
    }
    namex[0] = av[1];
    namex[1] = av[2];
    seqx[0] = getseq(namex[0], &len0);
    seqx[1] = getseq(namex[1], &len1);
    xbm = (dna)? _dbval : _pbval;

    endgaps = 0;            /* 1 to penalize endgaps */
    ofile = "align.out";    /* output file */

    nw();           /* fill in the matrix, get the possible jmps */
    readjmps();     /* get the actual jmps */
    print();        /* print stats, alignment */

    cleanup(0);     /* unlink any tmp files */
}
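
The DNA table in the listing above encodes each nucleotide letter as a set of bits (A = 1, C = 2, G = 4, T or U = 8), so that an ambiguity code matches any base it can stand for when the two masks share a bit. A minimal sketch of that test follows; the names demo_dbval and bases_match are hypothetical, and the table values are the standard IUPAC combinations assumed to correspond to the listing.

```c
#include <assert.h>
#include <ctype.h>

/* Bit sets per letter A..Z: A=1, C=2, G=4, T/U=8; ambiguity codes are ORs
 * of these (for example R = A|G = 5, N = 15). Letters with no nucleotide
 * meaning are 0 and therefore never match. */
static const int demo_dbval[26] = {
    1, 14, 2, 13, 0, 0, 4, 11, 0, 0, 12, 0, 3,
    15, 0, 0, 0, 5, 6, 8, 8, 7, 9, 0, 10, 0
};

/* Returns 1 when the two nucleotide codes can denote the same base.
 * Hypothetical helper; the listing performs the AND inline. */
int bases_match(char x, char y)
{
    assert(isalpha((unsigned char)x) && isalpha((unsigned char)y));
    int mx = demo_dbval[toupper((unsigned char)x) - 'A'];
    int my = demo_dbval[toupper((unsigned char)y) - 'A'];
    return (mx & my) != 0;
}
```

For instance, R (A or G) matches A but not C, and N matches every concrete base.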
/* do the alignment, return best score: main()
 * dna: values in Fitch and Smith, PNAS, 80, 1382-1386, 1983
 * pro: PAM 250 values
 * When scores are equal, we prefer mismatches to any gap, prefer
 * a new gap to extending an ongoing gap, and prefer a gap in seqx
 * to a gap in seq y.
 */
nw()
{
    char        *px, *py;       /* seqs and ptrs */
    int         *ndely, *dely;  /* keep track of dely */
    int         ndelx, delx;    /* keep track of delx */
    int         *tmp;           /* for swapping row0, row1 */
    int         mis;            /* score for each type */
    int         ins0, ins1;     /* insertion penalties */
    register    id;             /* diagonal index */
    register    ij;             /* jmp index */
    register    *col0, *col1;   /* score for curr, last row */
    register    xx, yy;         /* index into seqs */

    dx = (struct diag *)g_calloc("to get diags", len0+len1+1, sizeof(struct diag));

    ndely = (int *)g_calloc("to get ndely", len1+1, sizeof(int));
    dely = (int *)g_calloc("to get dely", len1+1, sizeof(int));
    col0 = (int *)g_calloc("to get col0", len1+1, sizeof(int));
    col1 = (int *)g_calloc("to get col1", len1+1, sizeof(int));
    ins0 = (dna)? DINS0 : PINS0;
    ins1 = (dna)? DINS1 : PINS1;

    smax = -10000;
    if (endgaps) {
        for (col0[0] = dely[0] = -ins0, yy = 1; yy <= len1; yy++) {
            col0[yy] = dely[yy] = col0[yy-1] - ins1;
            ndely[yy] = yy;
        }
        col0[0] = 0;    /* Waterman Bull Math Biol 84 */
    }
    else
        for (yy = 1; yy <= len1; yy++)
            dely[yy] = -ins0;

    /* fill in match matrix
     */
    for (px = seqx[0], xx = 1; xx <= len0; px++, xx++) {
        /* initialize first entry in col
         */
        if (endgaps) {
            if (xx == 1)
                col1[0] = delx = -(ins0+ins1);
            else
                col1[0] = delx = col0[0] - ins1;
            ndelx = xx;
        }
        else {
            col1[0] = 0;
            delx = -ins0;
            ndelx = 0;
        }

FIGURE 4E

...nw

        for (py = seqx[1], yy = 1; yy <= len1; py++, yy++) {
            mis = col0[yy-1];
            if (dna)
                mis += (xbm[*px-'A']&xbm[*py-'A'])? DMAT : DMIS;
            else
                mis += _day[*px-'A'][*py-'A'];

            /* update penalty for del in x seq;
             * favor new del over ongoing del
             * ignore MAXGAP if weighting endgaps
             */
            if (endgaps || ndely[yy] < MAXGAP) {
                if (col0[yy] - ins0 >= dely[yy]) {
                    dely[yy] = col0[yy] - (ins0+ins1);
                    ndely[yy] = 1;
                } else {
                    dely[yy] -= ins1;
                    ndely[yy]++;
                }
            } else {
                if (col0[yy] - (ins0+ins1) >= dely[yy]) {
                    dely[yy] = col0[yy] - (ins0+ins1);
                    ndely[yy] = 1;
                } else
                    ndely[yy]++;
            }

            /* update penalty for del in y seq;
             * favor new del over ongoing del
             */
            if (endgaps || ndelx < MAXGAP) {
                if (col1[yy-1] - ins0 >= delx) {
                    delx = col1[yy-1] - (ins0+ins1);
                    ndelx = 1;
                } else {
                    delx -= ins1;
                    ndelx++;
                }
            } else {
                if (col1[yy-1] - (ins0+ins1) >= delx) {
                    delx = col1[yy-1] - (ins0+ins1);
                    ndelx = 1;
                } else
                    ndelx++;
            }

            /* pick the maximum score; we're favoring
             * mis over any del and delx over dely
             */

FIGURE 4F

...nw

            id = xx - yy + len1 - 1;
            if (mis >= delx && mis >= dely[yy])
                col1[yy] = mis;
            else if (delx >= dely[yy]) {
                col1[yy] = delx;
                ij = dx[id].ijmp;
                if (dx[id].jp.n[0] && (!dna || (ndelx >= MAXJMP
                && xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
                    dx[id].ijmp++;
                    if (++ij >= MAXJMP) {
                        writejmps(id);
                        ij = dx[id].ijmp = 0;
                        dx[id].offset = offset;
                        offset += sizeof(struct jmp) + sizeof(offset);
                    }
                }
                dx[id].jp.n[ij] = ndelx;
                dx[id].jp.x[ij] = xx;
                dx[id].score = delx;
            }
            else {
                col1[yy] = dely[yy];
                ij = dx[id].ijmp;
                if (dx[id].jp.n[0] && (!dna || (ndely[yy] >= MAXJMP
                && xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
                    dx[id].ijmp++;
                    if (++ij >= MAXJMP) {
                        writejmps(id);
                        ij = dx[id].ijmp = 0;
                        dx[id].offset = offset;
                        offset += sizeof(struct jmp) + sizeof(offset);
                    }
                }
                dx[id].jp.n[ij] = -ndely[yy];
                dx[id].jp.x[ij] = xx;
                dx[id].score = dely[yy];
            }
            if (xx == len0 && yy < len1) {
                /* last col
                 */
                if (endgaps)
                    col1[yy] -= ins0+ins1*(len1-yy);
                if (col1[yy] > smax) {
                    smax = col1[yy];
                    dmax = id;
                }
            }
        }
        if (endgaps && xx < len0)
            col1[yy-1] -= ins0+ins1*(len0-xx);
        if (col1[yy-1] > smax) {
            smax = col1[yy-1];
            dmax = id;
        }
        tmp = col0; col0 = col1; col1 = tmp;
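
The recurrence in nw() keeps, for each cell, the best score ending in a match or mismatch and the best scores ending in a gap in either sequence, opening a gap at ins0 + ins1 and extending it at ins1. A much reduced sketch of that idea follows. It is not the listing's algorithm: it always penalizes end gaps, omits the MAXGAP cap, the jmp bookkeeping and the tie-breaking rules, compares bases by exact character rather than by bitmask, and returns only the score. The name nw_affine_score and the MAXLEN limit are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

#define DMAT    3       /* value of matching bases */
#define DMIS    0       /* penalty for mismatched bases */
#define DINS0   8       /* penalty for a gap */
#define DINS1   1       /* penalty per base */
#define MAXLEN  256     /* sketch-only limit on sequence length */
#define NEG_INF (-1000000)

static int max2(int a, int b) { return a > b ? a : b; }

/* Global alignment score of two DNA strings with affine gap penalties,
 * computed one row at a time (hypothetical helper, score only). */
int nw_affine_score(const char *x, const char *y)
{
    size_t n = strlen(x), m = strlen(y), i, j;
    int h[MAXLEN];  /* h[j]: best score for x[0..i) against y[0..j) */
    int f[MAXLEN];  /* f[j]: best score ending with x[i-1] opposite a gap */

    assert(n < MAXLEN && m < MAXLEN);
    h[0] = 0;
    f[0] = NEG_INF;
    for (j = 1; j <= m; j++) {
        h[j] = -(DINS0 + DINS1 * (int)j);
        f[j] = NEG_INF;
    }
    for (i = 1; i <= n; i++) {
        int diag = h[0];                    /* value of the previous row at j-1 */
        int e = NEG_INF;                    /* best score ending with y[j-1] opposite a gap */
        h[0] = -(DINS0 + DINS1 * (int)i);
        for (j = 1; j <= m; j++) {
            int s = (x[i-1] == y[j-1]) ? DMAT : DMIS;
            int best;
            e = max2(e - DINS1, h[j-1] - DINS0 - DINS1);        /* extend or open */
            f[j] = max2(f[j] - DINS1, h[j] - DINS0 - DINS1);    /* extend or open */
            best = max2(diag + s, max2(e, f[j]));
            diag = h[j];                    /* save previous-row value before overwrite */
            h[j] = best;
        }
    }
    return h[m];
}
```

Aligning ACGT with AGT under these constants yields 0: three matches worth 9, less one gap of length one costing 8 + 1.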
/* print() -- only routine visible outside this module
 *
 * static:
 * getmat() -- trace back best path, count matches: print()
 * pr_align() -- print alignment of described in array p[]: print()
 * dumpblock() -- dump a block of lines with numbers, stars: pr_align()
 * nums() -- put out a number line: dumpblock()
 * putline() -- put out a line (name, [num], seq, [num]): dumpblock()
 * stars() -- put a line of stars: dumpblock()
 * stripname() -- strip any path and prefix from a seqname
 */

#include "nw.h"

#define SPC     3
#define P_LINE  256     /* maximum output line */
#define P_SPC   3       /* space between name or num and seq */

extern  _day[26][26];
int     olen;           /* set output line length */
FILE    *fx;            /* output file */

print()
{
    int lx, ly, firstgap, lastgap;      /* overlap */

    if ((fx = fopen(ofile, "w")) == 0) {
        fprintf(stderr, "%s: can't write %s\n", prog, ofile);
        cleanup(1);
    }
    fprintf(fx, "<first sequence: %s (length = %d)\n", namex[0], len0);
    fprintf(fx, "<second sequence: %s (length = %d)\n", namex[1], len1);
    olen = 60;
    lx = len0;
    ly = len1;
    firstgap = lastgap = 0;
    if (dmax < len1 - 1) {          /* leading gap in x */
        pp[0].spc = firstgap = len1 - dmax - 1;
        ly -= pp[0].spc;
    }
    else if (dmax > len1 - 1) {     /* leading gap in y */
        pp[1].spc = firstgap = dmax - (len1 - 1);
        lx -= pp[1].spc;
    }
    if (dmax0 < len0 - 1) {         /* trailing gap in x */
        lastgap = len0 - dmax0 - 1;
        lx -= lastgap;
    }
    else if (dmax0 > len0 - 1) {    /* trailing gap in y */
        lastgap = dmax0 - (len0 - 1);
        ly -= lastgap;
    }
    getmat(lx, ly, firstgap, lastgap);
    pr_align();
}


FIGURE 4H

/*
 * trace back the best path, count matches
 */
static
getmat(lx, ly, firstgap, lastgap)
    int lx, ly;                 /* "core" (minus endgaps) */
    int firstgap, lastgap;      /* leading trailing overlap */
{
    int             nm, i0, i1, siz0, siz1;
    char            outx[32];
    double          pct;
    register        n0, n1;
    register char   *p0, *p1;

    /* get total matches, score
     */
    i0 = i1 = siz0 = siz1 = 0;
    p0 = seqx[0] + pp[1].spc;
    p1 = seqx[1] + pp[0].spc;
    n0 = pp[1].spc + 1;
    n1 = pp[0].spc + 1;

    nm = 0;
    while ( *p0 && *p1 ) {
        if (siz0) {
            p1++;
            n1++;
            siz0--;
        }
        else if (siz1) {
            p0++;
            n0++;
            siz1--;
        }
        else {
            if (xbm[*p0-'A']&xbm[*p1-'A'])
                nm++;
            if (n0++ == pp[0].x[i0])
                siz0 = pp[0].n[i0++];
            if (n1++ == pp[1].x[i1])
                siz1 = pp[1].n[i1++];
            p0++;
            p1++;
        }
    }

    /* pct homology:
     * if penalizing endgaps, base is the shorter seq
     * else, knock off overhangs and take shorter core
     */
    if (endgaps)
        lx = (len0 < len1)? len0 : len1;
    else
        lx = (lx < ly)? lx : ly;
    pct = 100.*(double)nm/(double)lx;
    fprintf(fx, "\n");
    fprintf(fx, "<%d match%s in an overlap of %d: %.2f percent similarity\n",
        nm, (nm == 1)? "" : "es", lx, pct);
    fprintf(fx, "<gaps in first sequence: %d", gapx);
    if (gapx) {
        (void) sprintf(outx, " (%d %s%s)",
            ngapx, (dna)? "base":"residue", (ngapx == 1)? "":"s");
        fprintf(fx, "%s", outx);
    }

    fprintf(fx, ", gaps in second sequence: %d", gapy);
    if (gapy) {
        (void) sprintf(outx, " (%d %s%s)",
            ngapy, (dna)? "base":"residue", (ngapy == 1)? "":"s");
        fprintf(fx, "%s", outx);
    }
    if (dna)
        fprintf(fx,
        "\n<score: %d (match = %d, mismatch = %d, gap penalty = %d + %d per base)\n",
        smax, DMAT, DMIS, DINS0, DINS1);
    else
        fprintf(fx,
        "\n<score: %d (Dayhoff PAM 250 matrix, gap penalty = %d + %d per residue)\n",
        smax, PINS0, PINS1);
    if (endgaps)
        fprintf(fx,
        "<endgaps penalized. left endgap: %d %s%s, right endgap: %d %s%s\n",
        firstgap, (dna)? "base" : "residue", (firstgap == 1)? "" : "s",
        lastgap, (dna)? "base" : "residue", (lastgap == 1)? "" : "s");
    else
        fprintf(fx, "<endgaps not penalized\n");
}

static          nm;             /* matches in core -- for checking */
static          lmax;           /* lengths of stripped file names */
static          ij[2];          /* jmp index for a path */
static          nc[2];          /* number at start of current line */
static          ni[2];          /* current elem number -- for gapping */
static          siz[2];
static char     *ps[2];         /* ptr to current element */
static char     *po[2];         /* ptr to next output char slot */
static char     out[2][P_LINE]; /* output line */
static char     star[P_LINE];   /* set by stars() */

/*
 * print alignment of described in struct path pp[]
 */
static
pr_align()
{
    int         nn;
    int         more;
    register    i;

    for (i = 0, lmax = 0; i < 2; i++) {
        nn = stripname(namex[i]);
        if (nn > lmax)
            lmax = nn;

        nc[i] = 1;
        ni[i] = 1;

FIGURE 4J

...pr_align

    for (nn = nm = 0, more = 1; more; ) {
        for (i = more = 0; i < 2; i++) {
            /*
             * do we have more of this sequence?
             */
            if (!*ps[i])
                continue;

            more++;

            if (pp[i].spc) {        /* leading space */
                *po[i]++ = ' ';
                pp[i].spc--;
            }
            else if (siz[i]) {      /* in a gap */
                *po[i]++ = '-';
                siz[i]--;
            }
            else {                  /* we're putting a seq element
                                     */
                *po[i] = *ps[i];
                if (islower(*ps[i]))
                    *ps[i] = toupper(*ps[i]);
                po[i]++;
                ps[i]++;

                /*
                 * are we at next gap for this seq?
                 */
                if (ni[i] == pp[i].x[ij[i]]) {
                    /*
                     * we need to merge all gaps
                     * at this location
                     */
                    siz[i] = pp[i].n[ij[i]++];
                    while (ni[i] == pp[i].x[ij[i]])
                        siz[i] += pp[i].n[ij[i]++];
                }
                ni[i]++;
            }
        }
        if (++nn == olen || !more && nn) {
            dumpblock();
            for (i = 0; i < 2; i++)
                po[i] = out[i];
            nn = 0;
        }
    }
}

/*
 * dump a block of lines, including numbers, stars: pr_align()
 */
static
dumpblock()
...dumpblock

    (void) putc('\n', fx);
    for (i = 0; i < 2; i++) {
        if (*out[i] && (*out[i] != ' ' || *(po[i]) != ' ')) {
            if (i == 0)
                nums(i);
            if (i == 0 && *out[1])
                stars();
            putline(i);
            if (i == 0 && *out[1])
                fprintf(fx, star);
            if (i == 1)
                nums(i);
        }
    }
}

/*
 * put out a number line: dumpblock()
 */
static
nums(ix)
    int ix;     /* index in out[] holding seq line */
{
    char            nline[P_LINE];
    register        i, j;
    register char   *pn, *px, *py;

    for (pn = nline, i = 0; i < lmax+P_SPC; i++, pn++)
        *pn = ' ';
    for (i = nc[ix], py = out[ix]; *py; py++, pn++) {
        if (*py == ' ' || *py == '-')
            *pn = ' ';
        else {
            if (i%10 == 0 || (i == 1 && nc[ix] != 1)) {
                j = (i < 0)? -i : i;
                for (px = pn; j; j /= 10, px--)
                    *px = j%10 + '0';
                if (i < 0)
                    *px = '-';
            }
            else
                *pn = ' ';
            i++;
        }
    }
    *pn = '\0';
    nc[ix] = i;
    for (pn = nline; *pn; pn++)
        (void) putc(*pn, fx);
    (void) putc('\n', fx);
}

/*
 * put out a line (name, [num], seq, [num]): dumpblock()
 */
static
putline(ix)
    int ix;
{
    int             i;
    register char   *px;

    for (px = namex[ix], i = 0; *px && *px != ':'; px++, i++)
        (void) putc(*px, fx);
    for (; i < lmax+P_SPC; i++)
        (void) putc(' ', fx);

    /* these count from 1:
     * ni[] is current element (from 1)
     * nc[] is number at start of current line
     */
    for (px = out[ix]; *px; px++)
        (void) putc(*px&0x7F, fx);
    (void) putc('\n', fx);
}

/*
 * put a line of stars (seqs always in out[0], out[1]): dumpblock()
 */
static
stars()
{
    int             i;
    register char   *p0, *p1, cx, *px;

    if (!*out[0] || (*out[0] == ' ' && *(po[0]) == ' ') ||
        !*out[1] || (*out[1] == ' ' && *(po[1]) == ' '))
        return;
    px = star;
    for (i = lmax+P_SPC; i; i--)
        *px++ = ' ';

    for (p0 = out[0], p1 = out[1]; *p0 && *p1; p0++, p1++) {
        if (isalpha(*p0) && isalpha(*p1)) {
            if (xbm[*p0-'A']&xbm[*p1-'A']) {
                cx = '*';
                nm++;
            }
            else if (!dna && _day[*p0-'A'][*p1-'A'] > 0)
                cx = '.';
            else
                cx = ' ';
        }
        else
            cx = ' ';
        *px++ = cx;
    }
    *px++ = '\n';
    *px = '\0';
/*
 * strip path or prefix from pn, return len: pr_align()
 */
static
stripname(pn)
    char    *pn;    /* file name (may be path) */
{
    register char   *px, *py;

    py = 0;
    for (px = pn; *px; px++)
        if (*px == '/')
            py = px + 1;
    if (py)
        (void) strcpy(pn, py);
    return(strlen(pn));
}
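
The routine above keeps only the part of a file name that follows the last '/'. A minimal sketch of the same idea with standard library calls follows. The name strip_path is hypothetical, and memmove is swapped in for the listing's strcpy because source and destination overlap, which strcpy does not permit in standard C.

```c
#include <assert.h>
#include <string.h>

/* Remove any leading directory path from `name` in place and return the
 * length of what remains (hypothetical helper for illustration). */
size_t strip_path(char *name)
{
    assert(name != NULL);
    char *slash = strrchr(name, '/');       /* last path separator, if any */
    if (slash != NULL)
        memmove(name, slash + 1, strlen(slash + 1) + 1);    /* include the NUL */
    return strlen(name);
}
```

A name without any '/' is left unchanged, and a name ending in '/' becomes the empty string.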
/*
 * cleanup() -- cleanup any tmp file
 * getseq() -- read in seq, set dna, len, maxlen
 * g_calloc() -- calloc() with error checking
 * readjmps() -- get the good jmps, from tmp file if necessary
 * writejmps() -- write a filled array of jmps to a tmp file: nw()
 */

#include "nw.h"
#include <sys/file.h>

char    *jname = "/tmp/homgXXXXXX";     /* tmp file for jmps */
FILE    *fj;

int     cleanup();                      /* cleanup tmp file */
long    lseek();

/*
 * remove any tmp file if we blow
 */
cleanup(i)
    int i;
{
    if (fj)
        (void) unlink(jname);
    exit(i);
}

/*
 * read, return ptr to seq, set dna, len, maxlen
 * skip lines starting with ';', '<', or '>'
 * seq in upper or lower case
 */
char *
getseq(file, len)
    char    *file;  /* file name */
    int     *len;   /* seq len */
{
    char            line[1024], *pseq;
    register char   *px, *py;
    int             natgc, tlen;
    FILE            *fp;

    if ((fp = fopen(file, "r")) == 0) {
        fprintf(stderr, "%s: can't read %s\n", prog, file);
        exit(1);
    }
    tlen = natgc = 0;
    while (fgets(line, 1024, fp)) {
        if (*line == ';' || *line == '<' || *line == '>')
            continue;
        for (px = line; *px != '\n'; px++)
            if (isupper(*px) || islower(*px))
                tlen++;
    }
    if ((pseq = malloc((unsigned)(tlen+6))) == 0) {
        fprintf(stderr, "%s: malloc() failed to get %d bytes for %s\n", prog, tlen+6, file);
        exit(1);
    }
...getseq

    py = pseq + 4;
    *len = tlen;
    rewind(fp);

    while (fgets(line, 1024, fp)) {
        if (*line == ';' || *line == '<' || *line == '>')
            continue;
        for (px = line; *px != '\n'; px++) {
            if (isupper(*px))
                *py++ = *px;
            else if (islower(*px))
                *py++ = toupper(*px);
            if (index("ATGCU", *(py-1)))
                natgc++;
        }
    }
    *py++ = '\0';
    *py = '\0';
    (void) fclose(fp);
    dna = natgc > (tlen/3);
    return(pseq+4);
}
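
The reader above decides between DNA and protein by counting how many letters are A, T, G, C or U and comparing that count with one third of the sequence length. A minimal sketch of the test on an in-memory string follows. The name looks_like_dna is hypothetical, and the sketch counts only alphabetic characters, which is an assumption about the intent rather than a transcription of the listing.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns 1 when more than one third (integer division) of the letters in
 * `seq` are A, T, G, C or U, in either case; 0 otherwise.
 * Hypothetical helper for illustration only. */
int looks_like_dna(const char *seq)
{
    int letters = 0, natgc = 0;
    const char *p;

    assert(seq != NULL);
    for (p = seq; *p != '\0'; p++) {
        if (!isalpha((unsigned char)*p))
            continue;                       /* skip digits, spaces, newlines */
        letters++;
        if (strchr("ATGCU", toupper((unsigned char)*p)) != NULL)
            natgc++;
    }
    return natgc > letters / 3;
}
```

A protein sequence rich in alanine, glycine, cysteine and threonine could cross the threshold, so the test is a heuristic rather than a guarantee.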



char *
g_calloc(msg, nx, sz)
    char    *msg;       /* program, calling routine */
    int     nx, sz;     /* number and size of elements */
{
    char    *px, *calloc();

    if ((px = calloc((unsigned)nx, (unsigned)sz)) == 0) {
        if (*msg) {
            fprintf(stderr, "%s: g_calloc() failed %s (n=%d, sz=%d)\n", prog, msg, nx, sz);
            exit(1);
        }
    }
    return(px);
}
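
The wrapper above pairs an allocation with an error message naming the caller's purpose. A minimal sketch of the same pattern in present-day C follows. The name checked_calloc is hypothetical, and unlike the listing this version exits on every failure, whether or not a message was supplied.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate `count` zeroed elements of `size` bytes, or report the failed
 * purpose `what` on stderr and exit (hypothetical helper). */
void *checked_calloc(const char *what, size_t count, size_t size)
{
    assert(what != NULL && count > 0 && size > 0);
    void *p = calloc(count, size);
    if (p == NULL) {
        fprintf(stderr, "checked_calloc() failed %s (n=%zu, sz=%zu)\n",
                what, count, size);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

A caller would write, for example, `int *col = checked_calloc("to get col0", len + 1, sizeof(int));` and release the block with free() when done.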



/*
 * get final jmps from dx[] or tmp file, set pp[], reset dmax: main()
 */
readjmps()
{
    int         fd = -1;
    int         siz, i0, i1;
    register    i, j, xx;

    if (fj) {
        (void) fclose(fj);
        if ((fd = open(jname, O_RDONLY, 0)) < 0) {
            fprintf(stderr, "%s: can't open() %s\n", prog, jname);
            cleanup(1);
        }
    }
    for (i = i0 = i1 = 0, dmax0 = dmax, xx = len0; ; i++) {


FIGURE 4P

...readjmps

            if (j < 0 && dx[dmax].offset && fj) {
                (void) lseek(fd, dx[dmax].offset, 0);
                (void) read(fd, (char *)&dx[dmax].jp, sizeof(struct jmp));
                (void) read(fd, (char *)&dx[dmax].offset, sizeof(dx[dmax].offset));
                dx[dmax].ijmp = MAXJMP-1;
            }
            else
                break;
        }
        if (i >= JMPS) {
            fprintf(stderr, "%s: too many gaps in alignment\n", prog);
            cleanup(1);
        }
        if (j >= 0) {
            siz = dx[dmax].jp.n[j];
            xx = dx[dmax].jp.x[j];
            dmax += siz;
            if (siz < 0) {          /* gap in second seq */
                pp[1].n[i1] = -siz;
                xx += siz;
                /* id = xx - yy + len1 - 1
                 */
                pp[1].x[i1] = xx - dmax + len1 - 1;
                gapy++;
                ngapy -= siz;
                /* ignore MAXGAP when doing endgaps */
                siz = (-siz < MAXGAP || endgaps)? -siz : MAXGAP;
                i1++;
            }
            else if (siz > 0) {     /* gap in first seq */
                pp[0].n[i0] = siz;
                pp[0].x[i0] = xx;
                gapx++;
                ngapx += siz;
                /* ignore MAXGAP when doing endgaps */
                siz = (siz < MAXGAP || endgaps)? siz : MAXGAP;
                i0++;
            }
        }
        else
            break;
    }

    /* reverse the order of jmps
     */
    for (j = 0, i0--; j < i0; j++, i0--) {
        i = pp[0].n[j]; pp[0].n[j] = pp[0].n[i0]; pp[0].n[i0] = i;
        i = pp[0].x[j]; pp[0].x[j] = pp[0].x[i0]; pp[0].x[i0] = i;
    }
    for (j = 0, i1--; j < i1; j++, i1--) {
        i = pp[1].n[j]; pp[1].n[j] = pp[1].n[i1]; pp[1].n[i1] = i;
        i = pp[1].x[j]; pp[1].x[j] = pp[1].x[i1]; pp[1].x[i1] = i;
    }
    if (fd >= 0)
        (void) close(fd);
    if (fj) {
/*
 * write a filled jmp struct offset of the prev one (if any): nw()
 */
writejmps(ix)
    int ix;
{
    char    *mktemp();

    if (!fj) {
        if (mktemp(jname) < 0) {
            fprintf(stderr, "%s: can't mktemp() %s\n", prog, jname);
            cleanup(1);
        }
        if ((fj = fopen(jname, "w")) == 0) {
            fprintf(stderr, "%s: can't write %s\n", prog, jname);
            exit(1);
        }
    }
    (void) fwrite((char *)&dx[ix].jp, sizeof(struct jmp), 1, fj);
    (void) fwrite((char *)&dx[ix].offset, sizeof(dx[ix].offset), 1, fj);
}

FIGURE 5 



5 

<nvirr(TariA(K';Ao\A(X\A(K^^^^ 

GAAGTG I GCTACA TCI CAGGt TI GGTC ITG TCCrG(TrAnX'ACXnTCXTGGTCXTGATG 
UXTCAlTGGTGArAGACAGGACrAACXTrrc^ 

l'r("iAGTCXX'J"rGC\A1XXjGACi1XXX"CAT(XXTCCXXKX"AA(KX'ATAl*rC'T(jTrGGA'IXjAGC 
10 TrCAGTCXTTACrAGACACXX^^^^^ 

"IT(XTGGC.iAACX , A(-'(iGCXX'T(i(jCX , TrCXTGGTGCTCATGCX TGTGCTGCATGGCAGGAAC 
t IX X.TGCTC 'IT XXX , IT C(XTG( JAG TCCTCXjTGGCXXTrCTGCiC''r(iAC v IT*rGGCCX''r(;GCr 
GTGATCCTGCAGAACATGGCAG(XX\VITGGG^^ 

CA( "iCTC JACX'AACXX it itXSAG 1 GCTCTA I GGAGCCAC V ITTXTTXT C ITTT'CCCTCAATGTG 
! 5 (TGGTGGGTGCC AT( i< iTGG( T\\(XTGGCXjAGTGCTCCTCTCTGCCCTCTACAAC( iCCATC 
<\MXTrGGa\\GATGGACXT(V\GCCTO 
TACTACAGGTACCGAA 






FIGURE 6 

5 CACAACCAGCCACGCGTCTAGGATCCCAGCCCA3GTGGTGGTGGGCTCAGAGGAGAAGGC 
CG'"GTGTTG'j'jAGGAGCC , TGCTTGCGrGGAG'jGACAAGTTTCCGGGAGAGATCA.^TAAAG 
GAAAGGAAAGAGAGAAGGAAGGGAGAGGTGAGGAGAGCGGTTGATTGGAGGAGAAGGGCC 
AGAG.A ATG TCGrCG-:vV3CCAGGAGGG.AAC-:AGACCTCC':CC'3GGGCCACAGAGGACTACT 
CGTATGGCAGGTGGTAGA rCGATGAGCCCCAGGGGGGCGAGGAGCTCCAGCCAGAGGGGG 
10 A.AGTGGCCTCGTGGCACAGGAGCATACCAGCCGGCCTGTACCACGCCTGCCTGGCCTCGC 
rGTCAATCGTTGTGCTGCTGCTCCTGijCCATGCTGGTGAGGOGCCGCCAGCTCTGGCCTG 
ACTGTGTGGGTGGCAGGCCCGGCCTGCCCAGGCCOCGGGCAGTGCCT'30TGCTGTTTTGA 
TGGTCCTCCTGAGCrCCCTGTGTTTGCTGCTCCCCGACGAGGACGCATTGGCCTTGCTGA 
CTC1 1 CGC , GTCAGGACCl\\GGC^GATGGG/\AA.ACTGAGGGTCCAAGAGGGGCCTGGAAG^ 

1 5 tactgggactgttctattatgctgcgctctactaccctgtggctgcgtgtgccacggctg 
gccacacaggtggagaggtggtgggcagcagggtgtggtgggggcaccttggggtggagg 
tctggga'jagggcaga'jt'rrccccaggtgocca.^gatctaga.agtactagtccctgctgg 
■:ctcc':tgcgtctg-:tgctgggcctgggattcctga'jCgttt'^gtaccctgtgcagctgg 

1 ■ jAGAA' J<J l l G'AGCGG lmGGACAGGAGCAGGCTCGAAGGGGCTGGAGAGGAGCTAGT "TG 
20 AGGAATATCT'lAGGAAGCTGCTTTGCAGG/\AG.AAG':T':3G'3AAi3CAGCTAGCACACCTCCA 
AGGAr<j"jCTTCCTGT<:=JTGGG':CGGGGTGTGCTTGAGACA':TijCATCTACACTGCACAGC 
GAGGA , rrC':ATGTGC , C'GCTG/\AGGTGGTGCTTTCAGCTACACTGACAGGGA':G'3CCATTT 
AGGAGGTGGCCCTGOTGCTGCTGGTGGGGGTGGTAGGGAGTATCGAGAAGGTGAGGGCAG 
GGGTCACOACGGATGTGTCGTAGGTGGTGGGCGGCTTTG 1 IAAFGGTGGTCTGCGAGGACA 
25 AGCAG'3AGGTGGTGGAGG1^jGTG.^G':ACCATCTGTGGGGTCTG'3AAGTGTGCTACATGT 
GAGCCTTGGTCTTG rCCTGCTTACTGAGGTTGGTGGTCCTGATGGGCTGACTGGTGACAG 
ACA'3GAG':aAGCTTCGAG':TGTGCACGGAGGAGCT'3CCGTGGACTTGAGTCGCTTGCATC 
GG AGTGGCGATCGC rGCGGCCAAJ jGCATATTGTGTTGGATGAGGTTGAGTGCCTAGCAGA 

gag : ::tttat : rggcttggggtggtggtggagcagatgatg rtcttcctgggaaccacgg 
30 :gctggcgttgctggt ^gtoatgcgtgtgctccatgggaggaagctcctgctgttgcgtt 
cgctggag rcctcgtggcccttgtggctgactttggccctgggtgtgatcctggagaaga 
tggcagcgcattgggtottcctggagagtcatgatggacacggagagctgagcaaccggc 
gagtgg'tgtatggagg'-^ggtttctt'jtgttcgcggtg/v^tgtgct'^gtgggtgcgatag 
tgggcaggtgg j ggagtgctggtgtctg'"gtgtag/vv:gggatgcagcttggccagat'3g 
35 agctca(^ggtggtGv:cacggaga'^cgggcagtgT':gacgggg(jGtagtacacgtagcgaa 
agtT'1 , i , i , g j aaga\ttga^i;;tgagggagtgggatccagggatgagagggttctgctgggti:3g 

TGG'TGGAAGCGCAGAr^irCT'l^rTAGCGAGGACCATGG'^i^GGG^GGCAGGAGAGCCT'rAGAG 
CAGGG< V^GGAAGAGGAAGGGATGCAGGTGGTAGAGAGAAA jGAGTGGATGGGGAAGGGAG 

ct agggccggggggag^cg :gggaggggtgggtggggtgtggggtacagggtggtggaoa 
40 acccaaoggtggaggtgttggggaagaggggcgtg ttgggtgggaatggtgcggaggggt 

GAtjGGGAGGGAAGGTGAA'rGGACCTGCGGATCTGTGGTGAGGGATijTTC'rTGCCTACGA': 

• t« •< • ; ■ • •• v- ; • [ • • • x'tgggaggatcacaggaggcatggagggaggaggtggtgg 

GGATGACTGTGGTTGGGTGGAGGTGTGTGTGGAGT "JGGAGGGT AAG 1 ^GGGGTGTGGTCG 

a: % \ • !' i •• ; .tat ;ggagag-:tag' 'aggggtt;t ;gagaaagaaa .tggtgggttag' »g 
45 ■■; :tt >■ • '."A ; \a : \\v vr fgaggga- ;g> ;gagg :agatg ;agg att -\ — :y v ' a; •• ; ' 

TGTG YTCAGCGTT' 1AA< ^JGITGGAT' .3A.\GG '.T P TG'P • FAY ' V "P ' 'A y ' W 

" jAG.r'r^AG':' ttg • ' tt ywggtgtggaaggag ?gaaggga 3tt':'gtgagggggt-?ag 

CGGGAGGGACGTCT :t\X^JGAGTGGGGGGAAAGGTGGGGGGGGTGTGGGGTGGAGGGGAG 
CCGAAGTGATGAGT "A" j ACGAGGTGCGACAGTGAGGTGC AJACACTGGAGAGCGAGATAT 
50 TTTTGTAGl^TTTTATG^^'lG^TGGL'TAlTATG.^AAGAG'J'r'rAG'rGT'JTTCCCTGGAA'rAAA 
CTTGTTCGTGAGAAAAA 



FIGURE 7 


MSSQI'AGNQTS PGATED YS YGSWY 1 DEPQGGEELQPEGEVPSCHTS I PPGLYHACLASL 
SILVLLLLAMLVRRROLWPDCVRGRPGLPRPRAVPAAVFMVLLSSLCLLLPDEDALPFL 
TLASAPSQDGKTEAPROAWK I LGLFY YAALYY PLAACATAGHTAAHLLGSTLSWAHLGV 
QVWQF.AECPQVPK I YK.Y YSLLASLPLLLGLGFLSLWYPVQLVRSFSRRTGAGS KGLQSS 

10 YSEEYLRNLLCRKKLGSSYHTSKHGFLSWARVCLRHCIYTPQPGFHLPLKLVLSATLTG 
TAI YOVALLLLVGWF'TIQK.VRAGVTTDVSYLLAGFGIVLSEDKQEWELVKHHLWALE 
VCYISALVLSCLLTFLVLMRSLVTHRTNLRALHRGAALDLSPLHRSPHPSRQAIFCWMS 
FSAYQTAFICLGLLVQOI I FFLGTTALAFLVLMPVLHGRNLLLFRSLESSWPFWLTLAL 
AVILQNMAAHWVFLETHDGHPQLTNRRVLYAATFLLFPLNVLVGAIVATWRVLLSALYN 

1 5 AIHLGQMDLSLLPPRAATLE'PGYYTYRNFLKIEVSQSHPAMTAFCSLLLQAQSLLPRTM 
AAPQItS LRPGEEDEGMOLLCTKDSMAKGARPGASF.gr ARWGLAYTLLHNPTLQVFRKTA 
LLGANGAQP 

Important features of the protein:
Signal peptide:
none

Transmembrane domain:
54-7]
9 3 - 1 : i
140-157
197-114
291-312
i56-:-7 1
425-444
464-481
505-522

Motif name: N-glycosylation site.
8-12

Motif name: N-myristoylation site.
- 2 2 i q

Motif name: Prokaryotic membrane lipoprotein lipid attachment site.
none

[Figure: bar chart over a panel of tissues. Legible tissue labels: Marrow, Colon, Breast, Spleen, Stomach, Thymus, Small Intestine, Prostate, Skeletal Muscle, Testis, Uterus. The axis title and bar values are not recoverable from the source.]